Abstract Over the period of 6 years and three phases, the SEE-GRID programme
has established a strong regional human network in the area of distributed scientific
computing and has set up a powerful regional Grid infrastructure. It attracted
a number of user communities and applications from diverse fields from countries 
throughout South-Eastern Europe. From the infrastructure point of view, the
first project phase has established a pilot Grid infrastructure with more than 20 
resource centers in 11 countries. During the subsequent two phases of the project, 
the infrastructure has grown to currently 55 resource centers with more than 6600 
CPUs and 750 TBs of disk storage, distributed in 16 participating countries. Inclusion
of new resource centers in the existing infrastructure, as well as support for
new user communities, has demanded the setup of regionally distributed core services,
the development of new monitoring and operational tools, and close collaboration of
all partner institutions in managing such a complex infrastructure. In this paper
we give an overview of the development and current status of the SEE-GRID regional
infrastructure and describe its transition to the NGI-based Grid model in EGI,
with strong SEE regional collaboration.

1 Introduction

The transition of traditional science to e-Science is fueled by the ever-increasing
need to process exceedingly large amounts of data and by exponentially
increasing computational requirements: in order to realistically describe and solve
real-world problems, numerical simulations are becoming more detailed, experimental
sciences use more sophisticated sensors to make precise measurements,
and the shift from individual-based scientific work towards a collaborative research
model now starts to dominate.

Computing resources and services able to support the needs of such a new model of
scientific work are available at different layers: local computing centers, national
and regional computing centers, and supercomputing centers. The gap between
the needs of various user communities and dispersed computing resources able to
satisfy their requirements is effectively bridged by the introduction of Grid technology
on top of the networking layer and local resource management layers.

Computing Grids are conceptually not unlike electrical grids. In an electrical 
grid, the wall outlets allow us to link to and use an infrastructure of resources, 
which generate, distribute, and bill for electrical power. When we connect to the 
electrical grid, we do not need to know details of the power plant currently generating
the electricity we use. In the same way, Grid technology uses a middleware layer
to coordinate and organize into one logical resource a set of available distributed 
computing and storage resources across a network, allowing users to access them in 
a unified fashion. The computing Grids, like electrical grids, aim to provide users 
with easy access to all the resources they need, whenever they need them, re- 
gardless of the underlying physical topology and management model of individual 
clusters. 

Grids address two distinct but related goals: providing remote access to infor- 
mation technology (IT) assets, and aggregating processing and storage power. The 
most obvious resources included in Grids are processors (CPUs) and data storage 
systems, but Grids also can encompass various sensors, applications, and other 
advanced types of resources. One of the first commonly known Grid initiatives
was the SETI@HOME project, which enlisted several million volunteers to
download a screensaver that used idle processor time to analyze
astronomical data in the search for extraterrestrial life.

In the past 6 years the European Commission has funded, through a number of 
targeted initiatives, activation of new user communities and enabling collaborative 
research across a number of fields in order to close existing technological and 
scientific gaps. In addition, this helps in bridging the digital divide, stimulating 
research and consequently alleviating the brain drain in the less-developed regions 
of Europe. This was especially successful in South-Eastern Europe (SEE),
where a number of such initiatives show excellent results. In the Grid arena, the
South-East European GRid eInfrastructure Development (SEE-GRID) series of
projects [1,2], through its first two 2-year phases, has established a strong human
network in the area of scientific computing and has set up a powerful regional
Grid infrastructure, attracting a large number of applications from diverse fields
from countries throughout South-Eastern Europe. The third 2-year phase of
the SEE-GRID programme, the SEE-GRID-SCI project [3], has aimed and succeeded
in having a catalytic effect on a number of SEE user groups, with a strong focus
on the key seismological, meteorological, and environmental communities.

One of the main successes of the SEE-GRID programme is the cumulative structuring
effort on the establishment of National Grid Initiatives (NGIs) in SEE
countries and collaborative work on achieving a sustainable model of operation,
strongly supported from national funding sources. The regional SEE-GRID initiative
has also supported and coordinated a successful transition of all SEE countries
from the centralized operations model to the NGI-based EGI infrastructure, which
is clearly visible from the participation of all partner countries in the 4-year
EGI-InSPIRE project [4].

2 Resource Centers 

The regional Grid infrastructure operated by SEE-GRID-SCI project was built on 
top of the pilot infrastructure established by the first SEE-GRID project (2004- 
2006), which was since then substantially extended and enlarged in terms of re- 
sources and number of Grid sites, and upgraded in terms of the deployed middle- 
ware and core services provided to existing and new user communities during the 
SEE-GRID-2 project (2006-2008). 

The operations activity adopted the pragmatic model of the 2-layered infras- 
tructures in which mature sites were migrated to the EGEE production in- 
frastructure, while the start-up sites from new institutes and user communities 
were incubated within the SEE-GRID infrastructure until they were ready to fol- 
low the requirements of the full-scale production infrastructure. In this way, both 
SEE-wide and national-level applications were able to benefit from the computing 
resources of both infrastructures, by mainly using the pilot infrastructure in the 
incubation phase and production infrastructure later, when they reach the pro- 
duction phase. Moreover, this approach ensured that smaller sites, typical for the 
region, have a chance to be a part of the regional SEE-GRID infrastructure acting 
as an incubator for their maturing into EGEE production. 

As applications developed in the region have matured, new Virtual Organizations
(VOs) have spun off with the relevant core services supported by the
SEE-GRID-SCI operations activity SA1. Discipline-specific services were deployed in
multiple instances (for failover and for achieving load-balancing through a wide
geographic distribution) over the e-Infrastructure and operationally maintained and
supported by SA1. Sophisticated operational tools, some of them being developed
within the joint research activity JRA1 of the SEE-GRID-SCI project, were used
to enhance infrastructure performance.

The SEE-GRID-SCI project has continued to operate and further extend, develop
and improve this infrastructure, with the aim to cater for the needs of all activated 
user communities in the region, with special emphasis on the three identified target 
areas: meteorology, seismology, and environmental sciences. Apart from computing
and storage resources made available to these user communities, the SA1 activity
provided and maintained a set of existing and new operational and monitoring 
tools so as to ensure proper operation of the infrastructure, and a set of primary 
and secondary core services for all deployed VOs in order to ensure optimal geo- 
graphical distribution according to the underlying network structure, load sharing, 
and quality of the service to end users. 

Fig. 1 Overview of the SEE-GRID-SCI infrastructure.

Currently the SEE-GRID-SCI infrastructure encompasses approximately 55 Grid
sites, more than 6600 CPUs, and around 750 TBs of available data storage
capacity, which is illustrated in Fig. 1, with further details given in Table 1. Overall
number of CPUs has grown from 2400 at the beginning of the SEE-GRID-SCI 
project in May 2008 to currently more than 6600, while the number of dedicated 
CPUs for SEE-GRID-SCI VOs is currently around 1500. The Grid operations activity
successfully maintains such a large, geographically dispersed and ever-growing
infrastructure, harmonizing its operation with the pan-European EGEE/EGI
infrastructure. In addition to this, one of the most important achievements of the SA1
activity is transfer of knowledge and Grid know-how to all participating countries, 
and support to their NGI operation teams to reach the level of expertise needed 
for sustainable NGI-based operational model in EGI. 

Table 1 SEE-GRID-SCI computing and storage resources.

Country               Total number of CPUs   Total storage [TB]
Greece                                1200                 66.8
Bulgaria                              1210                 42.3
Romania                                120                  4.0
Turkey                                2380                528.0
Hungary                                  8                  2.0
Albania                                 34                  1.3
Bosnia-Herzegovina                      80                  1.1
FYR of Macedonia                        80                  4.1
Serbia                                 974                 97.0
Montenegro                              40                  0.6
Moldova                                 24                  6.5
Croatia                                 44                  0.2
Armenia                                424                  0.2
Georgia                                 16                  0.1
Total                                 6634                754.2

After the completion of the SEE-GRID-SCI project in April 2010, the regional
Grid infrastructure was seamlessly integrated into the EGI infrastructure,
and continues to support all deployed Virtual Organizations (VOs) and applications
developed during the 6-year SEE-GRID programme. The strong human
network remains in place and still supports on-going transition of all countries to 
independent NGI operations through the SEE Regional Operations Centre. The 
catch-all SEE-GRID Certification Authority will continue its operation until all 
the countries from the region deploy their own national certification authorities. 
In terms of Grid operations, currently almost all (with only a few exceptions) NGI 
operations teams and infrastructures are fully validated by EGI teams, while vali- 
dation for the remaining SEE countries is expected to finish within a few months, 
i.e. by mid-2011. 

3 User Communities 

The core objective of the SEE-GRID-SCI project was to engage user communities 
from different regional countries in close collaboration. This strategy had a struc- 
turing effect for crucial regional communities. The target applications were selected 
from core earth science disciplines in the region, namely, seismology, meteorology 
and environmental protection. Thus, the focus of the project was to engage these 
three core cross-border communities in the research fields crucial for the region, 
structured in the form of Virtual Organizations (VO): 

— Seismology VO had six applications [6, 7, 8, 9, 10, 11, 12], ranging from Seismic
Data Service to Earthquake Location Finding, and from Numerical Modelling of
Mantle Convection to Seismic Risk Assessment.

— Meteorology VO, with two large-scale applications [13, 14, 15, 16, 17, 18, 19, 20,
21], follows an innovative approach to weather forecasting that uses a multitude
of weather models and bases the final forecast on an ensemble of weather
model outputs. The other problem tackled within this VO is the
reproduction/forecasting of the airflow over complex terrain.

— Environmental Protection VO supports eight applications [22, 23, 24, 25, 26, 27,
28, 29, 30, 31, 32], focusing on environmental protection/response and
environment-oriented satellite image processing.

In the Seismology VO, the work was organized around the development of 
Seismic Data Server (SDS) application services, providing distributed storage and 
serving of seismic data from different partner countries, logical organization and 
indexing of distributed seismic data, and programming tools (called iterators) 
that provide easy access to seismic data. In terms of applications, the focus was 
on gridification of five seismology applications from different South-Eastern European
countries: Seismic Risk Assessment (SRA), Numerical Modeling of Mantle
Convection (NMMC3D), Fault Plane Solution (FPS), Earthquake Location Finding
(ELF) and Massive Digital Seismological Signal Processing with the Wavelet
Analysis (MDSSP-WA).

Fig. 2 The distribution of the size of three target user communities (left, number of end users
per VO), and the distribution of the computing resources used by VOs (right).

In the Meteorology VO, with the aim of contributing to the improvement of
forecasts in the Mediterranean, the regional ensemble forecasting technique,
among others, has been explored within the SEE-GRID-SCI project.
A regional ensemble forecasting system built over the Mediterranean
requires a large infrastructure that was not easily available at medium-scale
research centres and institutions. For that reason, the Grid infrastructure
was explored for its ability to support the high CPU and storage needs of such a
regional ensemble forecasting system. This application allowed the meteorological
entities participating in the project to assess the probability that a particular weather
event will occur. This information is made freely available (to the participants
and to the general public) through the project web page, thus helping, when
needed, to make the necessary decisions based on this probabilistic information.
In addition, another set of applications allowed the entities participating in the
project to improve the understanding and forecasting of the airflow
over regions characterized by complex terrain. A further important benefit of
this application is the possibility of using this model for operational weather
forecasting. Operational weather forecasting model chains based on this model
have been developed within this project for Bosnia and Herzegovina,
Armenia and Georgia. This is an important benefit for the meteorological
services of these countries, which until now did not have the
infrastructure support to run weather forecasting models operationally for their
region.

The Environmental VO has dealt with several important problem areas in the 
domain of environmental modeling and environmental protection and the applica- 
tions developed within the VO advanced the scientific knowledge and affected the 
policy and decision-making process, responding to the EU directives and national 
priorities. New modeling techniques and algorithms were employed in several of 
the applications, using the power of the Grid in order to increase the spatial and 
temporal resolution and obtain more adequate representation of the natural pro- 
cesses under investigation. In other applications, established techniques were used, 
combined with filters and scripts developed by the project partners in order to ac- 
commodate these systems to the specifics of the Balkan region. The beneficiaries of 
the systems developed during the project's lifetime include not only environmental
scientists, but also the relevant governmental and international organizations, for 
example the international air quality monitoring bodies. By employing the Grid 
to increase the resolution these applications are now starting to target new bene- 
ficiaries like municipal authorities, small and medium enterprises and media. For 
many of the applications the validation of the models and standardizing the com- 
putational processes has been an important achievement, since the methodological 
aspect of these studies was a challenging one, especially in the Balkan region. 

Fig. 2 gives some details on the size of activated user communities, and the
distribution of computing resources they have utilized during the project lifetime.
Overall, during the period 2008-2010, the SEE-GRID-SCI project has provided more
than 22.5 million elapsed CPU hours or 2566 CPU years, and more than 4.5 million
jobs were executed on the regional infrastructure. Out of this, SEE-GRID-SCI and
national VOs amounted to 16.4 million CPU hours or 1872 CPU years (73%). The
total utilization of dedicated resources (based on the average number of 1050
available CPUs) was quite high, around 89%, and this has attracted the growth of
supported user communities, and enabled them to achieve a large amount
of new scientific results, as can be seen from the large number of scientific papers
published in peer-reviewed research journals [33, 34, 35, 36, 37, 38] and presented at
numerous scientific conferences [6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22,
23, 24, 25, 26, 27, 28, 29, 30, 31, 32]. The project itself organized the SEE-GRID-SCI
User Forum in December 2009, where the most significant results were presented.
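
The utilization figures quoted above can be cross-checked with simple arithmetic. The following minimal sketch assumes a two-year accounting window (May 2008 to April 2010) and 8760 hours per year; both values are assumptions made here for illustration, and all function and variable names are hypothetical.

```python
# Cross-check of the CPU-hour figures quoted in the text.
# Assumptions (not stated explicitly in the paper): a 2-year accounting
# window and 365 * 24 = 8760 hours per year.

HOURS_PER_YEAR = 365 * 24
PROJECT_YEARS = 2.0          # assumed duration: May 2008 to April 2010


def cpu_years(cpu_hours: float) -> float:
    """Convert elapsed CPU hours into CPU years."""
    return cpu_hours / HOURS_PER_YEAR


def utilization(used_cpu_hours: float, avg_cpus: int, years: float) -> float:
    """Fraction of the available capacity that was actually used."""
    capacity_hours = avg_cpus * years * HOURS_PER_YEAR
    return used_cpu_hours / capacity_hours


total_hours = 22.5e6         # all VOs ("more than 22.5 million")
vo_hours = 16.4e6            # SEE-GRID-SCI and national VOs
avg_dedicated_cpus = 1050    # average number of dedicated CPUs

vo_years = cpu_years(vo_hours)
vo_share = vo_hours / total_hours
vo_util = utilization(vo_hours, avg_dedicated_cpus, PROJECT_YEARS)

print(f"CPU years used by SEE-GRID-SCI and national VOs: {vo_years:.0f}")
print(f"Share of all CPU hours: {vo_share:.0%}")
print(f"Utilization of dedicated resources: {vo_util:.0%}")
```

Under these assumptions the 16.4 million CPU hours correspond to about 1872 CPU years, about 73% of the total, and about 89% of the capacity of 1050 CPUs, in line with the numbers given in the text.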

4 Core Services 

To operationally provide computational and storage resources to the three target
scientific communities supported by the SEE-GRID-SCI project, three different
VOs have been created: METEO, SEISMO, and ENV VO. The support for these
VOs, as well as for the catch-all SEEGRID VO, has been configured on all Resource
Centres participating in the regional infrastructure, and a set of core services was
installed and deployed by the SA1 activity, as illustrated in Fig. 3.




Fig. 3 Geographical distribution of core services. 



For each VO a primary and secondary VO Management Service (VOMS) has 
been deployed and maintained by institutes involved in the corresponding VO 
application development. Additionally, a set of core Grid services was deployed 
in order to support job management operations (Workload Management System
- WMS, Logging and Bookkeeping - LB), Grid information system (Berkeley
Database Information Index - BDII), data storage and transfer operations (Logical
File Catalog - LFC, File Transfer Service - FTS, ARDA Metadata Grid Application
- AMGA), and management of digital credentials (MyProxy - PX). Deployment
details of primary core Grid services are given in Table 2.

Table 2 List of primary core services deployed per VO.

Service    METEO VO               ENV VO                     SEISMO VO
VOMS       voms.grid.auth.gr      voms.ipp.acad.bg           voms.ulakbim.gov.tr
WMS & LB   wms.ipb.ac.rs          wms.ipp.acad.bg            wms.ulakbim.gov.tr
BDII       bdii.ipb.ac.rs         bdii.ipp.acad.bg           bdii.ulakbim.gov.tr
LFC        grid02.rcub.bg.ac.rs   lfc01.mosigrid.utcluj.ro   lfc.ulakbim.gov.tr
FTS        grid16.rcub.bg.ac.rs   -                          fts.ulakbim.gov.tr
AMGA       grid16.rcub.bg.ac.rs   -                          amga.ulakbim.gov.tr
PX         myproxy.ipb.ac.rs      myproxy.ipp.acad.bg        myproxy.ulakbim.gov.tr
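
Since each VO is served by a primary and a secondary instance of its core services, a client can fall back to the secondary when the primary is unreachable. The sketch below only illustrates this idea; it is not taken from the gLite middleware. The primary host name comes from Table 2, the secondary host name is a placeholder, and the liveness probe is injected as a function so that the example runs without network access.

```python
from typing import Callable, Iterable, Optional


def pick_endpoint(endpoints: Iterable[str],
                  is_alive: Callable[[str], bool]) -> Optional[str]:
    """Return the first endpoint that answers the probe, or None."""
    for host in endpoints:
        if is_alive(host):
            return host
    return None


# Ordered by preference: primary first, then secondary.
# The secondary name is a placeholder, not a real SEE-GRID host.
METEO_VOMS = ["voms.grid.auth.gr", "voms-secondary.example.org"]


def fake_probe(down: set) -> Callable[[str], bool]:
    """Build a probe that reports every host in `down` as unreachable."""
    return lambda host: host not in down


# Simulate an outage of the primary VOMS server.
chosen = pick_endpoint(METEO_VOMS, fake_probe({"voms.grid.auth.gr"}))
print(chosen)
```

In a real deployment the probe would be replaced by an actual service check, and the ordering of the list could follow the network topology rather than a fixed preference.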



5 Grid operations 

This section gives a brief description of operational procedures and key tools developed
during the course of the SEE-GRID programme. In addition, a number
of operational tools have been developed, improved and deployed by the
SEE-GRID-SCI SA1 activity and used in day-to-day infrastructure management, as
illustrated in Fig. 4. Table 3 lists all currently deployed tools including those used
for monitoring of the infrastructure, some of which are described in more detail
in the next section, while Fig. 4 gives their geographical distribution, as well as
the distribution of responsibilities for their deployment and maintenance. The
interactions and collaboration on the development and usage of described tools with
other Grid initiatives/projects are emphasized wherever applicable.

Recognizing that improvements in the quality and shaping-up of the SEE-GRID
infrastructure are an important and continuous effort, necessary for the
successful work of SEE-GRID application developers, as well as for the usage of
our infrastructure by the existing user communities, the pro-active monitoring of
Grid sites in the region was organized through rotating shifts by SA1 country
representatives (Grid Infrastructure Managers - GIMs). During each shift, the
corresponding GIM is designated as Grid-Operator-On-Duty, or GOOD [39].

Fig. 4 Geographical distribution of SEE-GRID operational and monitoring tools.

Table 3 Deployment of operational and monitoring tools in the SEE-GRID infrastructure.

Service                 Service URL
HGSM                    https://hgsm.grid.org.tr/
BBmSAM                  https://c01.grid.etfbl.net/bbmsam/
BBmobileSAM             https://c01.grid.etfbl.net/bbmsam/mobile.php
Gstat                   http://gstat.gridops.org/gstat/seegrid/
Accounting Portal       http://gserv4.ipp.acad.bg:8080/AccountingPortal/
Nagios                  https://portal.ipp.acad.bg:7443/seegridnagios/
Googlemap               http://www.grid.org.tr/eng/
MonALISA                http://monitor.seegrid.grid.pub.ro:8080/
Real Time Monitor       http://gridportal.hep.ph.ic.ac.uk/rtm/applet.html
WatG Browser            http://watgbrowser.scl.rs:8080/
WMS Monitoring Tool     http://wmsmon.scl.rs/
Repository Service      http://rpm.egee-see.org/yum/SEE-GRID/
Dwarf                   https://dwarf.scl.rs/
Grid-Operator-On-Duty   http://wiki.egee-see.org/index.php/SG-GOOD
Helpdesk                http://helpdesk.see-grid.eu/
SEE-GRID Wiki           http://wiki.egee-see.org/index.php/SEE-GRID-Wiki
P-Grade Portal          http://portal.p-grade.hu/multi-grid

Basically, the idea is that each GIM (i.e. GIM team from one country) is on
duty during one week, overseeing the infrastructure and opening trouble tickets in
the SEE-GRID Helpdesk to sites from all countries where operational problems are
identified using the available monitoring tools. Of course, all GIMs are expected 
to continually monitor and provide support to sites from their countries - this 
is their day-to-day duty, in addition to regular regional GOOD shifts. Details of 
the organization of GOOD shifts are given at the SEE-GRID Wiki [40, 41]. For
problems identified by GOODs, trouble tickets were created in the SEE-GRID
Helpdesk [42], and site managers were expected to deal with such operational
problems and provide feedback on the steps taken. Typically, simple problems were 
resolved within one working day, while for more complex issues typical resolution 
time was up to three working days. 
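
The weekly rotation of GOOD shifts described above amounts to a simple round-robin schedule. The following sketch shows one way to generate it; the team list, its order and the start date are illustrative assumptions, not the actual SEE-GRID rota.

```python
from datetime import date, timedelta
from itertools import cycle, islice
from typing import List, Tuple


def good_rota(teams: List[str], first_monday: date,
              weeks: int) -> List[Tuple[date, date, str]]:
    """Assign one GIM team per week, in round-robin order.

    Returns a list of (shift start, shift end, team) tuples.
    """
    schedule = []
    for i, team in enumerate(islice(cycle(teams), weeks)):
        start = first_monday + timedelta(weeks=i)
        schedule.append((start, start + timedelta(days=6), team))
    return schedule


# Illustrative subset of countries; 4 January 2010 was a Monday.
teams = ["Serbia", "Bulgaria", "Turkey"]
rota = good_rota(teams, date(2010, 1, 4), weeks=4)

for start, end, team in rota:
    print(f"{start} to {end}: GOOD = GIM team of {team}")
```

With three teams and four weeks, the first team is on duty again in the fourth week, which is the behaviour expected from rotating shifts.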

On the request of applications which need MPI support on sites, GOODs are
expected to test the MPI setup on all SEE-GRID sites which support MPI. The MPI
setup tests are performed at least once a week, and GOODs ensure that the test
parallel jobs run at the same time on at least two worker nodes (WNs), to test the
ssh setup as well. More details can be found on the Wiki page on Testing MPI support [30].

In this section we describe two selected tools used for Grid operations: the HGSM
database (used for maintaining the database of Grid resources and personnel) and the
Dwarf portal, related to software development and repositories (especially important
in maintaining up-to-date Grid middleware and application software).



5.1 HGSM 

Hierarchical Grid Site Management - HGSM [43] is a web-based management
application primarily geared towards Grid site administrators. At the beginning it
was designed to store static information about Grid sites and personnel responsible
for the sites, but later it evolved into a central information hub, also used by other
Grid monitoring and checking services.

Fig. 5 Overview of HGSM portal.

The idea behind the HGSM is to reflect the natural hierarchy present in the
infrastructure. For each supported infrastructure, HGSM has a ROC (Regional
Operational Center) associated with it at the top. These ROCs contain the countries
that participate in a particular infrastructure. Grid sites of each country are
listed under the respective country tree, and all details related to a specific Grid
site can be viewed under the respective site entry in the web front end of HGSM,
Fig. 5. The management personnel information is also stored for each organizational
level (ROC, country, site), containing contacts with both administrative and
management privileges.

While HGSM holds vast information about Grid sites and core services, it 
also contains personal information for named contacts (names, e-mail addresses 
and phone numbers). To properly protect this information, HGSM uses a digital 
certificate-based authentication system. HGSM server only authorizes people with 
a valid Grid certificate to view the information in HGSM web front-end. Editing 
information is only allowed to authorized personnel with administrative privileges. 
The authorization is organized in a hierarchical manner, so that an administrator 
at the higher level can manage every aspect (including the administrators) at lower 
organizational levels. 
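
The hierarchical authorization rule described above can be illustrated with a small tree structure. This is a minimal sketch under our own assumptions: the class, the field names and the certificate subject strings are hypothetical and do not reflect the actual HGSM data model.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Node:
    """One level of the hierarchy: a ROC, a country or a site."""
    name: str
    level: str                                     # "roc", "country" or "site"
    parent: Optional["Node"] = None
    admins: Set[str] = field(default_factory=set)  # certificate subjects


def can_edit(user_dn: str, node: Node) -> bool:
    """A user may edit a node if they administer it or any of its ancestors."""
    current: Optional[Node] = node
    while current is not None:
        if user_dn in current.admins:
            return True
        current = current.parent
    return False


# Hypothetical example tree: ROC -> country -> site.
roc = Node("SEE", "roc", admins={"/CN=roc-admin"})
country = Node("Serbia", "country", parent=roc, admins={"/CN=country-admin"})
site = Node("SITE-A", "site", parent=country, admins={"/CN=site-admin"})

print(can_edit("/CN=roc-admin", site))      # ROC admin manages lower levels
print(can_edit("/CN=site-admin", country))  # site admin cannot edit the country
```

Walking up the parent chain keeps the rule in one place: privileges granted at a higher level automatically apply to everything below it, while no privilege propagates upwards.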

HGSM has already been used by communities and projects other than SEE- 
GRID, e.g. by the Deployment of Remote Instrumentation Infrastructure - DORII 
project [33], as well as by the Spanish and Portuguese NGIs [45].



5.2 Dwarf 

The web-based Dwarf tool is composed of the Dwarf web portal [46], Dwarf modules
and the Dwarf database. Using the Public Key Infrastructure (PKI), the Dwarf framework
provides digital certificate-based management of RPM uploading and creation of
APT and YUM repositories. The Dwarf web portal home page, shown in Fig. 6,
gives an overview of the repository structure together with information on the context
of each repository, and the latest build's timestamp.

From the Dwarf web portal, a properly authenticated and authorized user can
perform the following operations on the repository:

— Create and change repository structure: Users can create paths to new distributions
and components, by specifying their names. In the current implementation
of the Dwarf framework, the users are able to create APT and YUM
repositories, as well as to create a MIRROR to an existing remote repository.

— Package uploading: Users can upload different software packages, but only to
sections of the repository for which they are authorized as contributors.

— Build repository: After each RPM upload, a user should re-build the repository
structure. If not, the Dwarf system will do it automatically, through a cron job.
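
The upload and rebuild rules listed above can be sketched as follows. This is only an illustration of the logic: the real Dwarf modules are bash scripts, and the class, the function names and the repository path used here are hypothetical. The index rebuild itself is simulated by clearing a flag.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Repository:
    kind: str                                         # "apt" or "yum"
    contributors: Set[str] = field(default_factory=set)
    packages: List[str] = field(default_factory=list)
    dirty: bool = False                               # upload awaiting a rebuild


def upload(repo: Repository, user: str, rpm: str) -> None:
    """Accept a package only from an authorized contributor."""
    if user not in repo.contributors:
        raise PermissionError(f"{user} is not a contributor to this repository")
    repo.packages.append(rpm)
    repo.dirty = True


def rebuild(repo: Repository) -> None:
    """Stand-in for the real index rebuild; here it only clears the flag."""
    repo.dirty = False


def cron_rebuild(repos: Dict[str, Repository]) -> List[str]:
    """Rebuild every repository left dirty by its users; return their names."""
    rebuilt = [name for name, repo in repos.items() if repo.dirty]
    for name in rebuilt:
        rebuild(repos[name])
    return rebuilt


repos = {"see-grid/yum": Repository("yum", contributors={"alice"})}
upload(repos["see-grid/yum"], "alice", "mytool-1.0-1.noarch.rpm")
print(cron_rebuild(repos))   # the periodic job picks up the pending rebuild
```

Tracking a per-repository flag lets the periodic job do nothing when users have already rebuilt the repository themselves.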

Dwarf modules are implemented as bash scripts which handle appropriate build 
actions on various repositories. 

After a new APT repository structure is created from the Dwarf web portal, 
the RPMs must be indexed to create the APT database. This is done by the APT 
Dwarf module, which uses the genbasedir tool for this purpose. It analyzes RPM 
packages in a directory tree and builds information files so that that directory tree 
can be used as a proper APT repository. 

The Dwarf database contains information on security (authentication and authorization),
repository types and metadata, mirror repositories, and logging information.
The repository metadata comprise build
timestamps, contexts, and descriptions of the repositories, as well as repository
types. The rules for creation of mirror repositories are also kept in the Dwarf
database. In addition, for security and auditing reasons, the database contains a
log of all user-initiated actions. The Dwarf database is realized using the MySQL
database technology.

Once the repository is constructed, it is made available by HTTP and FTP
servers configured and working on the Dwarf web portal. The Dwarf framework
provides configurations that should be included in the configuration files of the local
HTTP and FTP servers in order to provide the context of repositories.

Fig. 6 Overview of the SEE-GRID Dwarf web portal.

6 Monitoring of SEE-GRID infrastructure

The monitoring of the heterogeneous and widely geographically dispersed Grid
infrastructure is an essential task for achieving the required quality of service to
supported user communities. This has been defined through the SEE-GRID Service
Level Agreement (SLA), which has served as a prototype for the later adopted
EGEE SLA. The monitoring of the performance of sites is not only used for formal
assessment of the conformance to SLA, but also for day-to-day Grid operations,
since various monitoring tools provide the main channel for identification and
diagnostics of operational problems by Grid Operators on Duty and GIMs. The
most important such tools are listed in Table 3, and we briefly describe them in
this section. To illustrate how the conformance of availabilities of Grid services
to the adopted SLA was monitored and assessed, Fig. 7 gives an overview of the
availability monitoring results for the second year of the SEE-GRID-SCI project
(May 2009 to April 2010). Using the BBmSAM tool (described below), precise
measurement of the availability of all services was systematically done, and detailed
results were provided at different levels of granularity: per service, per site, per
country, and for the whole SEE-GRID infrastructure. For example, the overall availability
of resources (weighted by the CPU number of individual clusters) for the last four
quarters increased from around 78% in Q5 (May - July 2009) to around 89% in Q8
(February - April 2010). Strict enforcement of SLA led to a steady increase in the
availability and reliability of Grid services offered to our target user communities.
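
The CPU-weighted availability mentioned above can be written as A = sum(n_i * a_i) / sum(n_i), where n_i is the number of CPUs of cluster i and a_i is its measured availability. A minimal sketch follows; the site data are made up for illustration and are not BBmSAM measurements.

```python
from typing import Sequence, Tuple


def weighted_availability(sites: Sequence[Tuple[int, float]]) -> float:
    """CPU-weighted mean availability.

    sites: sequence of (cpu_count, availability) pairs, availability in [0, 1].
    """
    total_cpus = sum(cpus for cpus, _ in sites)
    if total_cpus == 0:
        raise ValueError("no CPUs to weight by")
    return sum(cpus * avail for cpus, avail in sites) / total_cpus


# Made-up example: one large reliable cluster and two small, less reliable ones.
sites = [(1000, 0.95), (200, 0.80), (50, 0.50)]

overall = weighted_availability(sites)
unweighted = sum(avail for _, avail in sites) / len(sites)

print(f"CPU-weighted availability: {overall:.3f}")
print(f"Unweighted mean:           {unweighted:.3f}")
```

In this example the weighted value is about 0.908 while the plain mean is 0.75, which shows why weighting by cluster size gives a better picture of the resources actually available to users.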

Fig. 7 Overview of availability of Grid services within the SEE-GRID-SCI infrastructure per
quarter in the second (final) year of the project (May 2009 to April 2010).

6.1 BBmSAM

Availability monitoring of the infrastructure is carried out using the Service Availability
Monitoring - SAM [37] framework developed in the EGEE project [5], which was
further developed and extended by the SEE-GRID series of projects and deployed by
its SA1 activity. The original SAM system consists of server and client components
which communicate using web services. The client initiates periodic tests of the
infrastructure and publishes data to the server, which stores them in an Oracle
database. The main change in the SAM framework in its adaptation for the SEE-GRID
community was its porting to MySQL, suitable for deployment in the region
and in line with the SEE-GRID open source philosophy.

The BBmSAM [48, 49, 50] platform is a web application coded in PHP and using
the MySQL database as data storage back-end (although any standard-compliant
SQL database server could be used, since it does not rely on any
MySQL-specific features). It has been tested under Apache HTTPD and Microsoft IIS web
servers, and should work with any web server supporting PHP (at least through
CGI). The main features of the BBmSAM system are:

— Use of unaltered client and sensor components of EGEE SAM system. 

— Synchronization with central HGSM service. 

— Use of free and open source technologies. 

The BBmSAM client and sensors are the same as the ones used in the standard EGEE
SAM distribution, and they operate in an identical way. In designing the BBmSAM
portal and dependent web services, special care was taken to make the solution
compatible with EGEE/EGI tools and practices. This was achieved by implementing,
in PHP/MySQL, the same web services as the ones used in the original
Java/Oracle-based SAM.

Fig. 8 Overview of the SEE-GRID BBmSAM web portal.

The main part of the BBmSAM web front-end, shown in Fig. 8, is a summary of
current results for all tested Grid sites, containing site names, countries and other
relevant details for each service.



6.2 SEE-GRID Accounting Portal 

The Accounting Portal [51] is a web-service-based utility that collects CPU accounting
data for the SEE-GRID computing resources and presents it statistically. Its main
purpose is to collect and manage accounting data for the sites in the SEE-GRID
infrastructure. Recently a new publisher was released, capable of collecting and
processing data for parallel MPI jobs, which are not properly accounted for when
using the standard publisher provided by gLite. The accounting processing structure
is based on two services: the MPI log parser and the accounting publisher. The MPI
log parser processes PBS server logs and inserts the data on MPI jobs into the MPI
accounting database on the MON node. Afterwards, the accounting publisher aggregates
the data from the standard accounting database and the MPI database and sends it to
the central accounting portal database. The publisher has a modular architecture
which allows the two modules (MPI and standard serial) to work independently, so
that sites that do not support MPI can use the same publisher.
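
The log-parsing step described above might look roughly like the following sketch. It
assumes records in the semicolon-separated accounting format written by Torque/PBS
(timestamp, record type, job identifier, then space-separated key=value attributes);
the log format actually parsed by the SEE-GRID tool is not specified here. The
function names, the sample record, and the rule that a parallel job is charged its
wall-clock time on every allocated slot are all assumptions made for illustration.

```python
def parse_hms(text):
    """Convert an HH:MM:SS duration into seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def parse_end_record(line):
    """Parse one job-end ('E') record; return None for any other record."""
    fields = line.strip().split(";", 3)
    if len(fields) != 4 or fields[1] != "E":
        return None
    # Attributes are space-separated key=value pairs (values containing
    # spaces are not handled in this sketch).
    attrs = dict(item.split("=", 1) for item in fields[3].split() if "=" in item)
    exec_host = attrs.get("exec_host", "")
    # Each "+"-separated entry in exec_host is one allocated slot.
    slots = len(exec_host.split("+")) if exec_host else 0
    wall = parse_hms(attrs.get("resources_used.walltime", "00:00:00"))
    return {
        "job_id": fields[2],
        "user": attrs.get("user"),
        "slots": slots,
        "walltime_s": wall,
        # Assumption: charge wall-clock time on every allocated slot.
        "charged_cpu_s": wall * slots,
        "is_parallel": slots > 1,
    }


# Invented sample record for a four-slot job.
sample = ("04/30/2010 12:05:30;E;1234.ce.example.org;user=alice group=seegrid "
          "queue=mpi exec_host=wn01/0+wn01/1+wn02/0+wn02/1 "
          "resources_used.cput=00:00:12 resources_used.walltime=00:05:30")
record = parse_end_record(sample)
print(record["slots"], record["charged_cpu_s"])
```

The sample shows why separate treatment of MPI jobs matters: the CPU time recorded
for the job is small, while the job occupied four slots for its whole duration.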

The new web front-end of the accounting portal (Fig. 9) dynamically generates
accounting statistics and charts. It is written in Adobe Flex and Java and implements
the MVC design pattern. The View module of the portal is written in Flex, offering an
interactive environment with dynamic visualization of the accounting data in tables,
bar charts and pie charts.

Fig. 9 Overview of the SEE-GRID accounting portal.

The Interface module is a Java web service which accepts input parameters such as
data type, job type, period, and the rows and columns for the tables. In addition, it
can filter the data by VO, country and site, offering more flexible data organization.
It can also generate SQL queries based on the provided parameters and extract the
required data from the accounting database. The data are returned in XML form,
suitable for import into a variety of other applications. The web portal is hosted on
a web server running Apache Tomcat with the Apache Axis web-service framework
installed.
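
The query generation and XML output described above can be sketched as follows. This
is not the portal's Java code: it is a small Python illustration using an in-memory
SQLite database, and the table name, column names and sample rows are hypothetical,
since the real accounting schema is not described here.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical whitelist of filter columns (VO, country, site).
FILTER_COLUMNS = {"vo", "country", "site"}


def build_query(filters):
    """Build a parameterized SELECT from a dict of optional filters."""
    clauses, params = [], []
    for column, value in filters.items():
        if column not in FILTER_COLUMNS:
            raise ValueError(f"unsupported filter: {column}")
        clauses.append(f"{column} = ?")
        params.append(value)
    sql = "SELECT site, SUM(cpu_hours) FROM jobs"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " GROUP BY site ORDER BY site"
    return sql, params


def to_xml(rows):
    """Serialize (site, cpu_hours) rows as a small XML document."""
    root = ET.Element("accounting")
    for site, cpu_hours in rows:
        ET.SubElement(root, "site", name=site, cpu_hours=str(cpu_hours))
    return ET.tostring(root, encoding="unicode")


# Invented sample data in a hypothetical "jobs" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (vo TEXT, country TEXT, site TEXT, cpu_hours REAL)")
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?, ?, ?)",
    [
        ("seegrid", "RS", "SITE-A", 10.0),
        ("seegrid", "RS", "SITE-A", 5.0),
        ("seegrid", "BG", "SITE-B", 2.5),
        ("other", "RS", "SITE-A", 99.0),
    ],
)
sql, params = build_query({"vo": "seegrid"})
rows = conn.execute(sql, params).fetchall()
xml_text = to_xml(rows)
print(xml_text)
```

Restricting filters to a fixed set of columns and passing values as query parameters
keeps user-supplied input out of the SQL text itself.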

6.3 WatG Browser 

The What is at the Grid (WatG) Browser [52] is a web-based Grid Information
System (GIS) visualization application providing a detailed overview of the status
and availability of various Grid resources in a given gLite-based e-Infrastructure.
It is able to query and present data obtained from Grid information systems at
different layers: from the local resource information system for a particular Grid
service (GRIS), to the Grid site information system (site BDII), and to the top-level
information system for the whole Grid infrastructure (top-level BDII).

The efficient implementation of WatG Browser allows quick and easy navigation
through the entries and objects of the LDAP tree retrieved by the specified query,
even if the output is very large and hierarchically complex. High responsiveness is
achieved through partial refreshes and asynchronous updates of the web page. A
partial refresh of the WatG application can be observed when an interaction event is
triggered, for example a click on the plus icon of the LDAP tree. The server processes
the request and returns a limited response specific to the data it receives, for
example the LDAP subtree that satisfies a given condition. The WatG server does not
send back an entire page, as conventional "click, wait and refresh" web applications
do. Instead, the WatG client updates the page based on the response, so that only
part of the page is updated. In other words, WatG's initial page (Fig. 10) is treated
like a template: the WatG server and client exchange data, and the client updates
parts of the template based on the data it receives from the server. Another way to
think about it is to consider the WatG application as driven by events and data,
whereas conventional web applications are driven by pages.

Fig. 10 Overview of WatG Browser.

The asynchronous nature of the WatG application is reflected in the fact that, after
sending data to the server, the client can continue processing while the server does
its processing in the background. During all this, a user can continue interacting
with the client without noticing an interruption or a lag in the response. For
example, a user can click on any plus or minus icon even during loading, in which
case a new request is created and executed afterwards. The client does not have to
wait for a response from the server before continuing, as is the case in the
traditional, synchronous approach. The WatG Browser is deployed by SCL [53] and is
publicly available at the address given in Ref. [52].
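
The partial-refresh idea can be illustrated with a toy sketch. It is not WatG code:
the directory entries, function names and the JSON fragment layout are invented, and
an in-memory dictionary stands in for the LDAP server. The point is only that the
server answers an "expand node" event with the children of one node, and the client
merges that fragment into the state it already displays.

```python
import json

# Toy directory keyed by distinguished name (DN); all entries are invented.
TREE = {
    "o=grid": [
        "mds-vo-name=SITE-A,o=grid",
        "mds-vo-name=SITE-B,o=grid",
    ],
    "mds-vo-name=SITE-A,o=grid": [
        "GlueServiceUniqueID=ce01,mds-vo-name=SITE-A,o=grid",
    ],
    "mds-vo-name=SITE-B,o=grid": [],
}


def expand_node(dn):
    """Server side: return only the children of one node as a JSON fragment."""
    return json.dumps({"dn": dn, "children": TREE.get(dn, [])})


def apply_fragment(page_state, fragment):
    """Client side: merge a fragment into the tree that is already displayed."""
    data = json.loads(fragment)
    page_state[data["dn"]] = data["children"]
    return page_state


# The user clicks the plus icon on the root, then on SITE-A.
state = {}
apply_fragment(state, expand_node("o=grid"))
apply_fragment(state, expand_node("mds-vo-name=SITE-A,o=grid"))
print(sorted(state))
```

Only the expanded nodes ever travel over the network, so the size of each response
depends on the subtree requested rather than on the size of the whole tree.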



6.4 WMS Monitoring Tool 

The complex task of discovering and managing computing resources on behalf of
user applications in the gLite Grid environment is performed by the Workload
Management System (WMS) service. The WMS monitoring tool WMSMon [54] provides
reliable, site-independent, centralized, and uniform monitoring of gLite WMS
services.

The WMSMon tool, developed and deployed by SCL [53], is based on a collector-agent
architecture that ensures monitoring of all properties relevant for the successful
operation of the gLite WMS service and triggers alarms if the values of certain
monitored parameters exceed predefined limits. In addition, the tool provides links
to the appropriate troubleshooting guides when problems are identified.
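
A minimal sketch of such a threshold check is given below. The parameter names and
limits are hypothetical and chosen only for illustration; the set of properties
actually monitored by WMSMon and their limits are not listed here.

```python
# Hypothetical monitored parameters and their predefined limits.
LIMITS = {
    "load_average": 10.0,
    "sandbox_disk_usage_pct": 90.0,
    "queued_jobs": 500,
}


def check_alarms(measurements, limits=LIMITS):
    """Return the sorted names of parameters whose value exceeds its limit."""
    return sorted(
        name
        for name, value in measurements.items()
        if name in limits and value > limits[name]
    )


# Invented measurements reported for one WMS service.
reported = {"load_average": 12.5, "sandbox_disk_usage_pct": 90.0, "queued_jobs": 120}
print(check_alarms(reported))
```

In this sketch a value equal to its limit does not raise an alarm; whether the
comparison should be strict is a design choice that would depend on the parameter.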

The WMSMon tool consists of two software components. The first one, the WMSMon
Agent, should be installed on all monitored WMS services, and locally aggregates
the values of all relevant parameters described above. The second component, the
WMSMon Collector, is installed on a dedicated machine equipped with a web server
and a gridFTP client; it collects the data from all WMSMon Agents and provides a web
interface for the graphical presentation of the collected data.

Fig. 11 Overview of WMSMon Portal.

The WMSMon web portal presents information from diverse WMS sources in a
unified way, as can be seen in Fig. 11. The main page provides an aggregated
status view of all monitored WMS services from the target Grid infrastructure.
This part of the portal presents the data in a simplified way, with the emphasis
on WMS services identified as not working properly. The portal also provides links
to pages with detailed information and graphs for each monitored WMS service.
These pages contain the latest data, as well as historical data presented in
graphical form.

In addition to the main WMSMon instance deployed by SCL [53], other instances
of WMSMon are installed and used at the Grid Operations Centre at CERN [55] and
at NIKHEF [56].



7 SEE Involvement in High Performance Computing 

The Grid developments in the region, described in this paper, are currently being
complemented with supercomputing / High-Performance Computing (HPC) actions. The
HP-SEE project [57] (High-Performance Computing Infrastructure for South East
Europe's Research Communities) is currently working along several strategic lines of
action. First, it is linking the existing HPC facilities in the region into a common
infrastructure, and providing operational and management solutions for it. Second,
it is striving to open this infrastructure to a wide range of new user communities,
including those of non-resourced countries, fostering collaboration and providing
advanced capabilities to more researchers, with an emphasis on strategic groups in
computational physics, computational chemistry and life sciences. Finally, it acts
as a catalyst for the establishment of national HPC initiatives, and will act as a
SEE bridge to the DEISA [58] infrastructure, also presented in this edition, as well
as to the PRACE [59] infrastructure.

Fig. 12 depicts the multi-dimensional regional e-Infrastructure in South-East
Europe, where HP-SEE effectively adds a new research infrastructure, consisting of
the HPC infrastructure and a knowledge / user layer, on top of the existing network
plane and parallel to the existing Grid plane, thus optimising all layers and further
enabling a wide range of new cross-border e-Science applications to be deployed
over the regional e-Infrastructure. This approach effectively creates an integrated
e-Infrastructure for new virtual research communities, and provides a platform for
collaboration between ICT engineers and computational scientists dealing with the
infrastructure on the one hand, and scientists from diverse scientific communities
in the region on the other.

It should be noted that this vision will provide an integrated infrastructure in
which the Grid and HPC layers are not mutually exclusive but rather complementary,
and tailored to the type of applications supported. Table 4 gives an overview of
the current and planned HPC resources that will be available to the HP-SEE
Virtual Research Communities within the project.

Fig. 12 SEE e-Infrastructure with HPC and new user communities.



The available resources will be integrated into a common infrastructure available
to the regional Virtual Research Communities. The current and planned HPC
infrastructure is heterogeneous, comprising BlueGene supercomputers and Intel/AMD
clusters, enhanced with GPU computing accelerators. Concerning middleware
deployment, we believe that the upcoming Unified Middleware Distribution, which will
combine Unicore, gLite and ARC, will be well suited for the regional HPC
infrastructure, given the current situation, in which various combinations of these
middleware stacks with batch systems and workflow management systems exist. The
regional HP-SEE infrastructure will be operated through an operations centre that
will be established within the project. The centre will carry out analysis,
requirements capture and evaluation, and deployment of the existing solutions for
system management of the regional infrastructure, and will identify missing
components and provide optimal solutions. Solutions by system vendors and successful
developments from European projects, especially DEISA and PRACE, will be taken into
account. Wherever possible, the existing solutions will be adapted and enhanced for
deployment in the regional infrastructure. A set of operational tools will be
deployed, covering user administration, accounting, distributed data management,
security, authentication and authorization, monitoring of distributed resources,
resource management and allocation, and a helpdesk for user support.



Table 4 Current and planned computing power (TFlops) by HP-SEE countries (double
precision for CPU and single precision for GPU).

The identified target user communities include computational physics, computational
chemistry and life sciences. Computational physics is represented by 8 applications
from 6 countries and covers the fields of many-body condensed matter physics,
including modeling of electron transport, modeling of complex gas dynamics and
convection, plasma physics and image processing. The computational chemistry
community includes 7 applications from 6 countries, covering the fields of molecular
dynamics and simulations, and materials science. The life sciences community has 7
applications from 5 countries, covering the fields of computational biology,
computational genomics, computational biophysics and DNA sequencing.

8 Transition to EGI and Conclusions 

Over the period of 6 years and three phases, the SEE-GRID programme aimed at
creating independent and sustainable NGIs in each country of South-Eastern
Europe. That has allowed all the countries to participate as full-fledged members
of the wider European Grid infrastructure, realized through the series of
EGEE projects and currently by the European Grid Initiative, EGI [1]. EGI is
established as a coordinating organization for the European Grid Infrastructure,
based on the federation of individual NGIs, aiming to support a wide variety of
multi-disciplinary user communities. To facilitate the above aim, the SEE-GRID
programme has focused both on stimulating support for policy makers and on
creating sustainable operational structures in each of the countries in the
region.

In particular, on the policy level, the last two years of the SEE-GRID programme
have focused on monitoring and improving the status of NGIs in partner countries,
and providing support for their evolution and integration into the environment
standardized by EGI, aiming to achieve sustainability as active partners in this
new pan-European collaboration model. This effort resulted in one of the main
successes of the project, with all countries of the region currently members or
associate members of EGI and participating as partners in the EGI-InSPIRE project.

On the operational level, the focus of SEE-GRID was to create and increase the
capacity of Grid resources in the region, create independent and stable operational
structures, increase the availability of Grid resources, deploy core services in all
countries of the region, as well as to develop a geographically distributed network
of Grid experts able to provide operational and application-level support to end
users. At the end of the 6th year of the SEE-GRID programme, all SEE countries are
providing such an operational infrastructure for the local and international user
communities from the pan-European EGI infrastructure, either as independent
NGIs or as a part of the South-Eastern Europe Regional Operations Centre.

We describe below the procedure followed by most of the countries in the region
in order to become fully independent operational NGIs from the technical point of
view. The new NGIs use EGI's Grid Operations Database, GOCDB [60], to register
their NGI management structure, sites and operational personnel. Most of the
SEE NGIs base their operational portal on the central portal provided by EGI,
performing operations via the NGI view that it offers. In some cases, such as the
Greek NGI, a standalone operational portal has been set up. During the course of
the SEE-GRID projects the regional Helpdesk was based on OneOrZero, as discussed
in Section 4. By the end of the SEE-GRID projects, the SEE-GRID Helpdesk was fully
integrated with the Global Grid User Support, GGUS [61], and it is therefore a
candidate system for NGIs to use as their national Helpdesk solution. In addition,
Request Tracker, RT [62], has been integrated with GGUS and can offer the same
functionality. The NGIs of the region can therefore select which helpdesk solution
to use (GGUS directly, OneOrZero, or RT). Since infrastructure monitoring in EGI has
moved from SAM to Nagios, all new NGIs install and operate their own instance
of Nagios, which integrates with the rest of EGI's monitoring systems. Finally, SEE
NGIs use the Unified Middleware Distribution (UMD) as a central repository for
installing basic middleware components, while still using the regional repository,
or even some national repositories, for software packages that are tailored to the
specific needs of their countries and are not available in UMD.

Towards the end of the SEE-GRID-SCI project (May 2010), all NGIs of the
project were migrated to EGEE/EGI via the SEE-ROC, utilizing the existing
ROC infrastructure and services. Since May 2010 and up to now (January 2011),
almost all the NGIs have migrated to the EGI operational model. An NGI typically
needs between 1 and 3 months to migrate its operational structure from SEE-ROC
to EGI.

The SEE-GRID programme had a pivotal role in bridging the digital divide in
the SEE region, in spearheading regional research collaborations, and in creating
a strong human network in the ICT field, paving the way towards full integration of
the region into the European Research Area (ERA). This work continues with the
HP-SEE project [57], which aims at bringing together the national HPC
infrastructures in the region of South Eastern Europe and the regional Virtual
Research Communities of Computational Physics, Computational Chemistry and Life
Sciences. Enabling those user communities to access HPC resources for their
scientific work is the prime goal of this new project, and demonstrates the success
of the SEE-GRID series of projects in involving scientists from the region in the
development and production use of distributed research infrastructures.